
    Efficient GPU-accelerated fitting of observational health-scaled stratified and time-varying Cox models

    The Cox proportional hazards model stands as a widely used semi-parametric approach for survival analysis in medical research and many other fields. Numerous extensions of the Cox model have further expanded its versatility. Statistical computing challenges arise, however, when applying many of these extensions to the increasing complexity and volume of modern observational health datasets. To address these challenges, we demonstrate how to employ massive parallelization through graphics processing units (GPUs) to enhance the scalability of the stratified Cox model, the Cox model with time-varying covariates, and the Cox model with time-varying coefficients. First, we establish how the Cox model with time-varying coefficients can be transformed into the Cox model with time-varying covariates when using discrete time-to-event data. We then demonstrate how to recast both of these into a stratified Cox model and identify their shared computational bottleneck, which arises when evaluating the now segmented partial likelihood and its gradient with respect to regression coefficients at scale. These computations mirror a highly transformed segmented scan operation. While this bottleneck is not an immediately obvious target for multi-core parallelization, we convert it into an un-segmented operation to leverage the efficient many-core parallel scan algorithm. Our massively parallel implementation significantly accelerates model fitting on large-scale and high-dimensional Cox models with stratification or time-varying effects, delivering an order-of-magnitude speedup over traditional central processing unit-based implementations.
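    The central step above, evaluating per-stratum risk-set sums for the segmented partial likelihood, can be illustrated with ordinary cumulative sums. The following is a minimal NumPy sketch, not the authors' GPU implementation, showing how a segmented (per-stratum) suffix scan can be recovered from one un-segmented scan plus per-stratum offsets; function and variable names are illustrative, and subjects are assumed sorted by stratum and then by event time.

    import numpy as np

    def stratified_risk_set_sums(eta, strata):
        """For each subject i, return the sum of exp(eta_j) over subjects j in the same
        stratum with a later-or-equal event time (subjects pre-sorted by stratum, time)."""
        w = np.exp(eta)
        # One global (un-segmented) suffix scan over all subjects.
        suffix = np.cumsum(w[::-1])[::-1]
        # Remove the contribution of later strata from each subject's suffix sum.
        _, starts = np.unique(strata, return_index=True)
        ends = np.append(starts[1:], len(w))
        tail = np.zeros_like(w)
        for s, e in zip(starts, ends):
            tail[s:e] = suffix[e] if e < len(w) else 0.0
        return suffix - tail

    # Toy usage: two strata with linear predictors eta.
    eta = np.array([0.1, -0.2, 0.3, 0.0, 0.5])
    strata = np.array([0, 0, 0, 1, 1])
    print(stratified_risk_set_sums(eta, strata))

    On a GPU, the same subtraction-of-offsets idea lets a single scan primitive over the whole dataset stand in for many small per-stratum scans, which is the kind of restructuring the abstract describes.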

    Massive Parallelization of Massive Sample-size Survival Analysis

    Large-scale observational health databases are increasingly popular for conducting comparative effectiveness and safety studies of medical products. However, the increasing number of patients poses computational challenges when fitting survival regression models in such studies. In this paper, we use graphics processing units (GPUs) to parallelize the computational bottlenecks of massive sample-size survival analyses. Specifically, we develop and apply time- and memory-efficient single-pass parallel scan algorithms for Cox proportional hazards models and forward-backward parallel scan algorithms for Fine-Gray models, for analyses with and without a competing risk, using a cyclic coordinate descent optimization approach. We demonstrate that GPUs accelerate the fitting of these complex models in large databases by orders of magnitude compared to traditional multi-core CPU parallelism. Our implementation enables efficient large-scale observational studies involving millions of patients and thousands of patient characteristics.
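    As a rough CPU illustration of why these risk-set quantities reduce to scans, the sketch below computes the coordinate-wise Cox partial-likelihood gradient and curvature for a single covariate from backward cumulative sums (Breslow handling of ties, subjects sorted by increasing time). It is a simplified sketch with assumed names, not the single-pass GPU algorithm or the Fine-Gray forward-backward scan described in the paper.

    import numpy as np

    def cox_coordinate_gradient(x_j, eta, delta):
        """Gradient and curvature of the log partial likelihood with respect to one
        coefficient, using suffix scans for the risk-set sums (delta = 1 for events)."""
        w = np.exp(eta)
        s0 = np.cumsum(w[::-1])[::-1]                 # sum of exp(eta) over the risk set
        s1 = np.cumsum((w * x_j)[::-1])[::-1]         # sum of x_j * exp(eta)
        s2 = np.cumsum((w * x_j ** 2)[::-1])[::-1]    # sum of x_j^2 * exp(eta)
        mean = s1 / s0
        grad = np.sum(delta * (x_j - mean))           # first derivative
        curv = np.sum(delta * (s2 / s0 - mean ** 2))  # negative second derivative
        return grad, curv

    # Toy usage: one covariate column, linear predictors, and event indicators.
    x_j = np.array([1.0, 0.0, 2.0, 1.0])
    eta = np.array([0.2, -0.1, 0.0, 0.3])
    delta = np.array([1.0, 0.0, 1.0, 1.0])
    print(cox_coordinate_gradient(x_j, eta, delta))

    A cyclic coordinate descent iteration would then apply a one-dimensional Newton step, grad / curv, to the single coefficient before moving to the next covariate.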

    Rewriting and suppressing UMLS terms for improved biomedical term identification

    Background: Identification of terms is essential for biomedical text mining. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining, we implemented and evaluated nine term rewrite rules and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact of the different rules on the number of terms identified in a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms, together with a sample of 100 randomly selected terms, were evaluated for every rule.
    Results: Five of the nine rewrite rules were found to generate additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms, and seven of the eight suppression rules were found to suppress only undesired terms. Using the five rewrite rules that passed our evaluation, we identified 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, an increase of 2.3% in the number of terms and 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size; 7,397 terms were suppressed in the corpus.
    Conclusions: We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is used for biomedical term identification in MEDLINE. A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.
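    To make the flavor of such rules concrete, the sketch below applies two hypothetical rewrite rules (syntactic inversion and removal of a trailing "NOS" qualifier) and one hypothetical suppression rule (very short terms). These are illustrative stand-ins for the kinds of rules discussed above, not the specific rule set evaluated in the paper.

    import re

    def rewrite_syntactic_inversion(term):
        """'Aneurysm, Ruptured' -> 'Ruptured Aneurysm' (single-comma inversions only)."""
        parts = term.split(", ")
        return f"{parts[1]} {parts[0]}" if len(parts) == 2 else term

    def rewrite_strip_nos(term):
        """Drop a trailing ' NOS' (not otherwise specified) qualifier."""
        return re.sub(r"\s+NOS$", "", term)

    def suppress_short_term(term, min_len=3):
        """Flag terms too short to be useful for corpus term identification."""
        return len(term) < min_len

    terms = ["Aneurysm, Ruptured", "Headache NOS", "Fe"]
    rewritten = [rewrite_strip_nos(rewrite_syntactic_inversion(t)) for t in terms]
    kept = [t for t in rewritten if not suppress_short_term(t)]
    print(kept)  # ['Ruptured Aneurysm', 'Headache']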

    Using the Data Quality Dashboard to improve the EHDEN network

    Federated networks of observational health databases have the potential to be a rich resource to inform clinical practice and regulatory decision making. However, the lack of standard data quality processes makes it difficult to know if these data are research ready. The EHDEN COVID-19 Rapid Collaboration Call presented the opportunity to assess how the newly developed open-source tool Data Quality Dashboard (DQD) informs the quality of data in a federated network. Fifteen Data Partners (DPs) from 10 different countries worked with the EHDEN taskforce to map their data to the OMOP CDM. Throughout the process, at least two DQD results were collected and compared for each DP. All DPs showed an improvement in their data quality between the first and last run of the DQD. The DQD excelled at helping DPs identify and fix conformance issues but showed less of an impact on completeness and plausibility checks. This is the first study to apply the DQD to multiple, disparate databases across a network. While study-specific checks should still be run, we recommend that all data holders converting their data to the OMOP CDM use the DQD, as it ensures conformance to the model specifications and that a database meets a baseline level of completeness and plausibility for use in research.
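    As a purely illustrative sketch of how per-category results from a DQD run might be summarized and compared between runs, the snippet below tallies pass rates for the conformance, completeness, and plausibility categories. The record layout and field names are assumed for illustration; this is not the DataQualityDashboard API.

    from collections import Counter

    def summarize_dqd_run(results):
        """results: iterable of dicts such as {'category': 'Conformance', 'failed': False}."""
        totals, failures = Counter(), Counter()
        for r in results:
            totals[r["category"]] += 1
            failures[r["category"]] += int(r["failed"])
        for cat in ("Conformance", "Completeness", "Plausibility"):
            n, f = totals[cat], failures[cat]
            pct = 100.0 * (n - f) / n if n else 100.0
            print(f"{cat}: {n - f}/{n} checks passed ({pct:.1f}%)")

    # Toy usage: an early run for one data partner; a later run would be summarized the same way.
    first_run = [
        {"category": "Conformance", "failed": True},
        {"category": "Completeness", "failed": False},
        {"category": "Plausibility", "failed": False},
    ]
    summarize_dqd_run(first_run)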